mask prompt
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain
Yoon, Hangyul; Jang, Doohyuk; Kim, Jungeun; Yang, Eunho
Leveraging pre-trained models with tailored prompts for in-context learning has proven highly effective in NLP tasks. Building on this success, recent studies have applied a similar approach to the Segment Anything Model (SAM) within a "one-shot" framework, where only a single reference image and its label are employed. However, these methods face limitations in the medical domain, primarily due to SAM's essential requirement for visual prompts and the over-reliance on pixel similarity for generating them. This dependency may lead to (1) inaccurate prompt generation and (2) clustering of point prompts, resulting in suboptimal outcomes. To address these challenges, we introduce Med-PerSAM, a novel and straightforward one-shot framework designed for the medical domain. Med-PerSAM relies solely on visual prompt engineering and, owing to our automated prompt generation process, requires neither additional training of the pre-trained SAM nor human intervention. By integrating our lightweight warping-based prompt tuning model with SAM, we enable the extraction and iterative refinement of visual prompts, enhancing the performance of the pre-trained SAM. This advancement is particularly meaningful in the medical domain, where creating visual prompts poses notable challenges for individuals lacking medical expertise. Our model outperforms various foundation models and previous SAM-based approaches across diverse 2D medical imaging datasets.
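The abstract's core loop, warping the one-shot reference label into a coarse mask and then iteratively converting that mask into visual prompts for a frozen SAM, can be sketched as below. This is a minimal illustration under assumed interfaces: `warp_reference_label`, `sam_predict`, and `prompts_from_mask` are hypothetical stubs, not the authors' actual components.

```python
import numpy as np

# Hypothetical stand-ins for the paper's components; names and
# signatures are illustrative, not the authors' API.
def warp_reference_label(ref_image, ref_label, target_image):
    """Warping model: align the one-shot reference mask to the target (stub)."""
    return ref_label  # identity warp as a placeholder

def sam_predict(image, points, box):
    """Frozen SAM forward pass given visual prompts (stub)."""
    return np.zeros(image.shape[:2], dtype=bool)

def prompts_from_mask(mask):
    """Turn a coarse mask into visual prompts: one centroid point and a box."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return [], None
    point = (int(xs.mean()), int(ys.mean()))        # positive point prompt
    box = (xs.min(), ys.min(), xs.max(), ys.max())  # bounding-box prompt
    return [point], box

def one_shot_segment(ref_image, ref_label, target_image, n_rounds=3):
    # Initial coarse mask from the warped one-shot reference label.
    mask = warp_reference_label(ref_image, ref_label, target_image)
    # Iterative refinement: mask -> prompts -> SAM -> better mask.
    for _ in range(n_rounds):
        points, box = prompts_from_mask(mask)
        mask = sam_predict(target_image, points, box)
    return mask
```

In this reading, no SAM weights are updated; only the lightweight warping model would be trained, which is consistent with the abstract's claim of avoiding additional training of the pre-trained SAM.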
- Asia > China > Guangdong Province > Shenzhen (0.06)
- South America > Peru > Lima Department > Lima Province > Lima (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- (2 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Liang, Feng; Wu, Bichen; Dai, Xiaoliang; Li, Kunpeng; Zhao, Yinan; Zhang, Hang; Zhang, Peizhao; Vajda, Peter; Marculescu, Diana
Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-trained vision-language models, e.g., CLIP, to classify masked regions. We identify the performance bottleneck of this paradigm to be the pre-trained CLIP model, since it does not perform well on masked images. To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions. We collect training data by mining an existing image-caption dataset (e.g., COCO Captions), using CLIP to match masked image regions to nouns in the image captions. Compared with training on more precise, manually annotated segmentation labels with fixed classes (e.g., COCO-Stuff), we find that our noisy but diverse dataset better retains CLIP's generalization ability. Along with finetuning the entire model, we utilize the "blank" areas in masked images via a method we dub mask prompt tuning. Experiments demonstrate that mask prompt tuning brings significant improvement without modifying any weights of CLIP, and it can further improve a fully finetuned model. In particular, when trained on COCO and evaluated on ADE20K-150, our best model achieves 29.6% mIoU, which is +8.5% higher than the previous state of the art. For the first time, open-vocabulary generalist models match the performance that supervised specialist models achieved in 2017, without dataset-specific adaptations.
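A minimal sketch of the "mask prompt tuning" idea described above, assuming a ViT-style CLIP visual encoder: patch tokens coming from fully masked-out ("blank") regions are replaced with learnable prompt tokens while CLIP's own weights stay frozen. The module and names below are an illustrative re-implementation of the idea, not the released OVSeg code.

```python
import torch
import torch.nn as nn

class MaskPromptedPatchEmbed(nn.Module):
    """Patch embedding in which blank (masked-out) patches contribute
    learnable "mask prompt" tokens instead of near-zero tokens.
    Illustrative sketch, not the authors' implementation."""

    def __init__(self, embed_dim=768, patch_size=16, in_ch=3, num_patches=196):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # One learnable prompt token per patch position; these are the only
        # parameters that need to train, so CLIP itself can stay frozen.
        self.mask_prompt = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, masked_image, patch_is_blank):
        # masked_image: (B, 3, H, W), background zeroed by the mask proposal
        # patch_is_blank: (B, num_patches) bool, True where a patch is blank
        tokens = self.proj(masked_image).flatten(2).transpose(1, 2)  # (B, N, D)
        blank = patch_is_blank.unsqueeze(-1)                         # (B, N, 1)
        # Swap blank-patch tokens for the learnable prompts.
        return torch.where(blank, self.mask_prompt.expand_as(tokens), tokens)

# Example: a 224x224 masked crop with 16x16 patches -> 196 tokens.
embed = MaskPromptedPatchEmbed()
img = torch.randn(2, 3, 224, 224)   # masked image crops
blank = torch.rand(2, 196) > 0.7    # which patches the mask zeroed out
tokens = embed(img, blank)          # (2, 196, 768), fed to the frozen ViT
```

This matches the abstract's claim that the technique improves results "without modifying any weights of CLIP": only the prompt tokens receive gradients.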
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)